A Decision-Theoretic Comparison of Treatments to Resolve Air
Leaks After Lung Surgery Based on Nonparametric Modeling 

Yanxun Xu, Peter F. Thall, Peter Müller, and Mehran J. Reza

Abstract 

We propose a Bayesian nonparametric utility-based group sequential design
for a randomized clinical trial to compare a gel sealant to standard care for
resolving air leaks after pulmonary resection. Clinically, resolving air leaks in the
days soon after surgery is highly important, since longer resolution time produces
undesirable complications that require extended hospitalization. The problem of
comparing treatments is complicated by the fact that the resolution time distributions
are skewed and multimodal, so using means is misleading. We address these
challenges by assuming Bayesian nonparametric probability models for the resolution
time distributions and basing the comparative test on weighted means. The
weights are elicited as clinical utilities of the resolution times. The proposed design
uses posterior expected utilities as group sequential test criteria. The procedure's
frequentist properties are studied by extensive simulations.

KEY WORDS: Bayesian nonparametric; Clinical trial; Mesothelioma; Utility 
function 

Dept. of Statistics and Data Sciences, University of Texas at Austin
Dept. of Applied Mathematics and Statistics, Johns Hopkins University
Dept. of Biostatistics, University of Texas, M.D. Anderson Cancer Center
Dept. of Mathematics, University of Texas at Austin, pmueller@math.utexas.edu
Dept. of Thoracic Surgery, The University of Texas M.D. Anderson Cancer Center

1 Introduction

1.1 The motivating clinical trial

Intraoperative air leaks (IALs) occur in 48% to 75% of patients after pulmonary resection
(Serra-Mitjans and Belda-Sanchis, 2005). Despite the routine use of intraoperative sutures
and stapling devices, IALs remain a significant problem in the practice of thoracic
surgery. IALs that persist beyond the immediate postoperative period of five days may
result in longer chest tube drainage, greater postoperative pain, increased risk of infection,
empyema, thromboemboli, and increased length of hospitalization (Merritt et al.,
2010; Singhal et al., 2010). Air leaks are a particularly severe problem in patients with
emphysematous lungs or who have undergone extensive visceral pleural denuding procedures,
such as pleurectomy decortication. This is a surgical procedure in which the
lining surrounding one lung first is removed (pleurectomy), and then any tumor masses
that are growing inside the chest cavity are removed (decortication). In addition to the
noted risks to the patient, the economic impact of a prolonged air leak is significant,
primarily due to increased hospital stay. Because the standard procedure of suturing
visible leaks and using staple reinforcement gives unpredictable results, an alternative
technique to control IALs is the use of liquid sealants, which are thick fluids instilled
in the areas of leaks. Progel (Neomend, Inc., Irvine, CA) is a polymeric biodegradable
hydrogel sealant that currently is the only FDA-approved sealant to control IALs during
pulmonary resection (Kobayashi et al., 2001).

Despite FDA approval, the true benefit of Progel in reducing the rate of occurrence or
duration of IALs in lung resection patients has not been established, and therefore it is
not used routinely. Researchers have conducted two studies comparing Progel (treatment
group) with standard care (control group) to demonstrate the safety and efficacy of Progel
(Allen et al., 2004; Klijian, 2012). Because the study of Allen et al. (2004) varied the
application of Progel based on the size of the air bubbles seen in each patient, and the
precise methodology of how this was done was not explained in sufficient detail to enable
replication, the results of this trial are of limited use for a general comparison of Progel
to standard care. The study of Klijian (2012) was retrospective and not randomized.
Given these limitations of existing data, the desire to obtain a prospective randomized
comparison of Progel to standard care motivated the clinical trial described in this paper.
The trial has received IRB (institutional review board) approval and is scheduled to start
accrual at The University of Texas M.D. Anderson Cancer Center.

1.2 Modeling considerations 

Denote by T the number of days to resolve an IAL, allowing the possibility that an air leak
may not develop, represented by T = 0. Allen et al. (2004) and Klijian (2012) compared μ_0 and μ_1,
the means of T in a control and treatment group, respectively, using a standard t-test, and
concluded that Progel was superior to standard care in reducing IALs. Figure 1 plots the
histogram of T obtained from non-randomized historical data in the clinical database
of the Department of Thoracic and Cardiovascular Surgery at M.D. Anderson Cancer
Center. The histogram suggests that a standard parametric model is inappropriate to
describe air leak resolution time distributions. For example, a normal or lognormal
distribution would fail to allow for the observed multimodality and late resolution times.
Moreover, some patients treated with Progel after resection may be free of air leaks
immediately following surgery, corresponding to a positive probability mass at T = 0.

Let G_1 denote the distribution of T in the treatment (Progel) group and G_0 the
distribution of T with the control (standard care). We will represent each G_j, j = 0, 1,
as a mixture of a point mass at 0 and a hypothesized distribution M_j for non-zero
resolution times, with M_1 a left-shifted version of M_0 to formalize the assumption that,
stochastically, IAL resolution times with Progel are no longer than with standard care.
This order constraint is motivated by several medical considerations: Progel is inert, and
thus it cannot react chemically with the patient's lung tissue, is not a potential source of
infection, and does not slow down the healing process. Moreover, Progel cannot make an
air leak worse because it does not contribute to air leak formation. These considerations
motivate an a priori stochastic ordering of G_1 and G_0, which effectively says that, in terms
of time to resolve an IAL, Progel may be better than standard care, but it cannot be
worse. Nevertheless, for comparison we later also report inference under an otherwise
equivalent model without the stochastic ordering constraint.

An important consideration in developing a trial design is that the use of an expected
value as the target for a comparative test is inappropriate and inadequate, both because
the historical distribution is skewed with a long right tail, and because a change in the
early days after surgery is clinically more important than a comparable change in later
days. Also, a standard test of μ_1 = μ_0 versus μ_1 < μ_0 would require an impractically
large sample size to achieve any reasonable power. These complications are the principal
reasons why designing a randomized trial to compare Progel to standard care is nontrivial,
and why the use of Progel has not been widely accepted among surgeons who
perform pulmonary resections.

Figure 1: Histogram of times to resolve lung air leaks, from the historical dataset.

The desire to obtain reliable confirmatory evidence to evaluate the comparative benefit
of Progel motivates the randomized trial described in this paper. The goal of the
trial is to assess the extent to which Progel is superior to standard care. The comparison
also allows for the possibility of no difference.

1.3 Stochastic ordering and Bayesian nonparametric priors 

The time until resolution of air leaks for patients treated with Progel is a priori expected
to be shorter than under standard care. This introduces a stochastic ordering constraint
on G_0 and G_1. Formally, a distribution G_1 is stochastically smaller than G_0, denoted by
G_1 ⪯ G_0, if the corresponding cumulative distribution functions satisfy F_1(t) ≥ F_0(t) for
all t. Lehmann and Romano (2006) and Randles and Wolfe (1979) have modeled stochastic
ordering parametrically. Although straightforward, these approaches are limited by
the requirement that a parametric family must be specified.

To compare distributions of air leak resolution times, detailed features of the distributions
(e.g., skewness or multimodality) are important, leading us to consider a Bayesian
nonparametric (BNP) approach. Importantly, uncertainties about the inference on these
details are critical, as posterior probabilities about comparisons drive the decisions about
sequential continuation and the terminal decision. Such descriptions of uncertainties are
best considered in the framework of a probability model on the unknown distributions,
as implemented in BNP models.

Formally, BNP refers to prior models for infinite-dimensional unknown quantities.
Inference for random distributions, like G_0 and G_1 here, is a typical example. A common
feature of BNP models is their large support, which allows one to approximate essentially
arbitrary distributions (Ishwaran and James, 2001). For the proposed design, we
use a model based on the Dirichlet process (DP) prior (Ferguson, 1973), which is by far
the most commonly used BNP model for a random distribution. MacEachern (1999)
introduced the dependent DP (DDP), which extended the DP to a probability model
for a family {G_x, x ∈ X} of random probability measures, indexed by some covariate
x. The special case of a finite family, like {G_0, G_1} in our application, was discussed in
De Iorio et al. (2004). Several authors have considered BNP models for stochastically
ordered distributions. Gelfand and Kottas (2001) started with two DP random probability
measures G_0 and G_1, and used the product of the corresponding cumulative distribution
functions to define a pair of stochastically ordered random probability measures. A general
methodology for stochastic ordering by considering probability measures constrained
to a convex set was proposed by Hoff (2003). Finally, Dunson and Peddada (2008) incorporated
stochastic ordering constraints in the DDP prior. In this paper, we use a simple
implementation of this finite DDP model with an order constraint as the prior probability
model for the distributions G_1 and G_0 of leak resolution times under Progel and control.
Details are discussed in Section 3. For more extensive reviews of BNP methods, see, for
example, Hjort et al. (2010).

Based on the proposed BNP model, we define a utility-based decision criterion to
develop a clinical trial design. To our knowledge, there is no literature on using BNP
stochastic ordering models in the construction of clinical trial designs. The main novelties
in the proposed approach are the successful use of utilities in a small-scale clinical trial,
a convincing case for the need for a full probabilistic description of uncertainties on random
probability measures, and a simple and practicable construction of a BNP prior on
stochastically ordered random probability measures with point masses. The trial will be
conducted at The University of Texas M.D. Anderson Cancer Center, with a co-author
of this paper (RM) as its Principal Investigator.

Important practical advantages of the proposed approach are that it allows meaningful 
borrowing of information from historical data (by centering the BNP model), borrowing 
across treatments (by constructing correlated priors on Go and Gi), and the exploitation 
of stochastic ordering constraints, if warranted and approved in IRB reviews. The use of 
utility weighting for the outcomes, as in this application, is particularly natural under a 
BNP model because it allows inference about all aspects of the event time distribution, 
without constraint to parametric families. Together, these features allow the investigators 
to plan a much smaller sample size than what would be required by a conventional trial 
design. For example, based on the historical mean of 8 days and standard deviation of 8.76 days,
a two-sample one-sided 0.05-level t-test with power 0.80 to detect a 25% drop in the
mean, from 8 to 6 days, would require a sample of n = 476 patients. This is impossible
for this single-institution trial. Given the realistic maximum target accrual of 48 patients, 
the question is whether a design can be constructed that has reasonably high power to 
conclude that Progel is superior to standard care under clinically meaningful alternatives. 
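The sample size quoted above can be checked with the usual normal-approximation formula for comparing two means. The sketch below assumes equal allocation and a common SD equal to the historical 8.76 days, and uses only Python's standard library; it illustrates the arithmetic and is not necessarily the calculation the authors used. The helper name `n_per_arm` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_arm(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per arm for a one-sided two-sample
    comparison of means: n = 2 (z_{1-alpha} + z_{power})^2 sd^2 / delta^2."""
    z = NormalDist()                  # standard normal distribution
    z_alpha = z.inv_cdf(1.0 - alpha)  # one-sided critical value
    z_power = z.inv_cdf(power)        # quantile matching the target power
    return math.ceil(2.0 * (z_alpha + z_power) ** 2 * sd ** 2 / delta ** 2)


# Historical SD 8.76 days; detect a drop in the mean from 8 to 6 days.
arm = n_per_arm(delta=8.0 - 6.0, sd=8.76)
print(arm, 2 * arm)  # 238 476
```

Rounding up gives 238 patients per arm, or 476 in total, in line with the figure in the text; using t quantiles instead of normal quantiles would give a slightly larger number.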
Table 1: Elicited utility u(T) for T = days to resolve an intraoperative air leak.

| T (days) | 0   | 5  | 10 | 15 | 20 | 25 | 30 | 35 | ≥ 40 |
|----------|-----|----|----|----|----|----|----|----|------|
| Utility  | 100 | 50 | 10 | 6  | 5  | 4  | 3  | 2  | 0    |
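Table 1 gives u only at nine time points, while the expected utility defined in Section 2 needs u at every outcome. The sketch below shows one way to fill the gaps, assuming piecewise-linear interpolation on the Y = log(T + 1) scale; the interpolation rule is our assumption for illustration, not something stated in the text, and `utility_y` and `utility_t` are hypothetical names.

```python
import numpy as np

# Elicited utilities from Table 1 (T = days to resolve the air leak).
T_GRID = np.array([0, 5, 10, 15, 20, 25, 30, 35, 40], dtype=float)
U_GRID = np.array([100, 50, 10, 6, 5, 4, 3, 2, 0], dtype=float)
Y_GRID = np.log1p(T_GRID)  # the same points on the Y = log(T + 1) scale


def utility_y(y):
    """Utility on the Y scale, ASSUMED piecewise linear between elicited points.
    np.interp holds the end values fixed outside the grid, so u = 100 for
    Y <= 0 and u = 0 beyond T = 40."""
    return np.interp(y, Y_GRID, U_GRID)


def utility_t(t):
    """The same utility expressed in days."""
    return utility_y(np.log1p(t))


# Expected values: 100, 50, 10, 0, 0
print(utility_t(np.array([0.0, 5.0, 10.0, 40.0, 60.0])))
```

Interpolating on the Y scale rather than in days makes u fall more slowly per day at later times; either choice reproduces the elicited values exactly at the tabulated points.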


We will show that the proposed design, based on differences in mean utilities evaluated 
from the posterior under the BNP model, has very desirable operating characteristics 
with n = 48 patients. The impact is the opportunity to establish what is expected to be 
a greatly superior treatment option for patients, with reasonable cost and effort. 


2 Utilities and Trial Design 

The primary outcome is T, the time (in days) to resolve an air leak in the lungs following
surgery, and we define Y = log(T + 1). The possibility that an air leak may not develop
is represented by T = Y = 0. The mean, or any other single measure of central tendency
of T, is not an appropriate summary for treatment comparisons. Instead, we take a
utility-based approach. Utility-based decision criteria have been used recently in clinical
trial designs (Thall and Nguyen, 2012; Lee et al., 2015). We use utilities to weight
the importance of air leak resolution times after surgery. For example, a difference of
a few days in time to resolution of air leaks in the days immediately after surgery is
far more important than a comparable difference in later days. We performed a formal
utility elicitation with our clinical collaborator RM. The rationale of the utility elicitation
includes: 1) the most desirable resolution time is T = 0 (free of air leaks immediately
after surgery, although this ideal outcome is almost never seen with standard care); 2)
early (1 ≤ T ≤ 5) resolution of air leaks is very desirable, and therefore the interval [1, 5]
received a relatively high utility; 3) the utilities drop off steeply for later resolution times
(T > 5). These considerations are both medical and economic, and they motivated the
elicited utilities u(Y) for Y = log(T + 1) in Table 1. In the table, the numerical utility 50
assigned to the outcome T = 5 days corresponds to the subjective assessment of RM that
this comparatively favorable outcome is half as desirable as the ideal outcome of having
no air leak at all. Similarly, the utility for T = 10 days reflects that this outcome, which
involves a long hospital stay and the complications described earlier, is 1/5 as desirable
as the outcome T = 5 days. We are now ready to define the expected utility for each
group as


$$U_j = \int u(Y)\, G_j(dY), \qquad j = 0, 1, \qquad (2.1)$$


where G_j is a sampling model for the outcome in treatment group j. This expectation is
over the distribution of the outcome T and is conditional on the unknown distribution
G_j. We do not need to make any specific assumptions about G_j yet, except for the
existence of such a distribution.
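Because (2.1) is an expectation, it can be approximated by Monte Carlo once G_j is specified. A minimal sketch, assuming G_j has the point-mass-plus-normal-mixture form used for the simulation truths in Table 2, and assuming u is obtained by linear interpolation of Table 1 on the Y scale. The helper `expected_utility` is hypothetical, and because the interpolation rule is an assumption, the values it returns need not reproduce the utilities reported in Table 2.

```python
import numpy as np

# Table 1 utilities, ASSUMED linear in Y = log(T + 1) between elicited points.
Y_GRID = np.log1p(np.array([0, 5, 10, 15, 20, 25, 30, 35, 40], dtype=float))
U_GRID = np.array([100, 50, 10, 6, 5, 4, 3, 2, 0], dtype=float)


def expected_utility(nu0, weights, means, sd, n_draws=200_000, seed=0):
    """Monte Carlo approximation of U_j = int u(Y) G_j(dY) for
    G_j = nu0 * delta_0 + (1 - nu0) * sum_h weights[h] N(means[h], sd^2)."""
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                           # weights of the continuous part
    mu = np.asarray(means, dtype=float)
    comp = rng.choice(len(w), size=n_draws, p=w)
    y = rng.normal(mu[comp], sd)              # draws from the continuous part
    u_zero = np.interp(0.0, Y_GRID, U_GRID)   # exact contribution of the point mass
    return nu0 * u_zero + (1.0 - nu0) * np.interp(y, Y_GRID, U_GRID).mean()


# Scenario 2 truth from Table 2 (sigma = 0.3): 0.1 delta_0 + 0.63 N(2, .) + 0.27 N(3, .)
print(round(expected_utility(0.1, [0.63, 0.27], [2.0, 3.0], 0.3), 2))
```

The point mass is handled exactly and only the continuous part is simulated, which removes the Monte Carlo noise coming from the zero outcomes.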

Based on the probability model and the utilities of Table 1, we now define a design
for the Progel trial. There are two types of decisions to be considered. At each interim
test in the group sequential procedure, we make a stopping decision d_n ∈ {0, 1} to stop
(d_n = 0) or continue (d_n = 1). If we reach a predetermined maximum sample size, N,
we set d_N = 0 by definition. Upon stopping, a terminal decision a ∈ {0, 1} reports the
final recommendation, with a = 1 denoting a recommendation for Progel and a = 0 for
standard care. A decision-theoretic optimal solution would require backward induction
(Bellman, 1957) to solve the full sequential decision problem. We stop short of carrying
out this computationally prohibitive solution. Instead, we propose to conduct the trial
as follows.


Sequential stopping rule. Patients are enrolled in the trial sequentially in cohorts
of size m = 16 until a maximum of N = 48 patients is reached or early stopping is
indicated. All patients are randomized equally to the control group and the treatment group,
with the restriction of perfect balance after each cohort, when the continuation decisions
d_n are made.
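The restriction of perfect balance after each cohort can be met, for example, by randomly permuting an equal number of control and Progel labels within each cohort of m = 16 patients. The sketch below assumes that mechanism, since the text does not say how balance is enforced; `randomize_cohort` is a hypothetical name.

```python
import numpy as np


def randomize_cohort(cohort_size=16, rng=None):
    """Restricted equal randomization (ASSUMED mechanism): a random permutation
    of equal numbers of control (0) and Progel (1) labels within one cohort."""
    if cohort_size % 2:
        raise ValueError("cohort size must be even for perfect balance")
    rng = np.random.default_rng() if rng is None else rng
    labels = np.repeat([0, 1], cohort_size // 2)
    return rng.permutation(labels)


rng = np.random.default_rng(2024)
arms = [randomize_cohort(16, rng) for _ in range(3)]  # up to N = 48 patients
print([int(a.sum()) for a in arms])  # [8, 8, 8]: 8 Progel patients per cohort
```

Because every cohort is balanced, the data available at each interim look always contain n/2 patients per arm.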

Denote by Y_n = {Y_ji, i = 1, ..., n/2, j = 0, 1} the observed data for the first n treated
patients, that is, n/2 patients in each group under the restricted equal randomization
(rounding n/2 for odd n). The proposed decision criterion is the posterior probability

$$\eta(\epsilon_U, Y_n) = p(U_1 > U_0 + \epsilon_U \mid Y_n), \qquad (2.2)$$

where ε_U > 0 is a minimum clinically meaningful difference in expected utility. Because
the sequential rule makes multiple decisions, as with any group sequential procedure the
decision boundaries must be calibrated to control the design's overall false positive error
rate. This is similar to the use of so-called alpha-spending functions in conventional
frequentist group sequential designs. Like other frequentist summaries, the false positive error
rate (type I error) is a probability under an assumed truth, with respect to repeated
simulations of the entire trial. In the context of clinical trial designs, such summaries under
repeated simulations are also known as (frequentist) operating characteristics (OCs).
Because evaluating the design's OCs analytically is far too complex, we do this by repeated
computer simulations of the design under an array of different possible scenarios.
This follows routine practice in evaluating the behavior of sequentially adaptive clinical
trial designs. In the present setting, the OCs are the type I error, mean sample size, and
probabilities of different possible decisions (correct decision, stop due to futility, stop due
to superiority). Details are reported in Tables 3 and S1.
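Since the criterion (2.2) is a posterior probability, it can be estimated from MCMC output as the fraction of posterior draws of (U_0, U_1) in which U_1 exceeds U_0 by more than the minimum clinically meaningful difference. A minimal sketch, assuming paired posterior draws are already available; the default difference of 18 is the value used in Section 4, `eta_hat` is a hypothetical name, and the draws in the usage line are made-up numbers for illustration.

```python
import numpy as np


def eta_hat(u0_draws, u1_draws, eps_u=18.0):
    """Monte Carlo estimate of p(U_1 > U_0 + eps_u | data) from paired
    posterior draws of the expected utilities (U_0, U_1)."""
    u0 = np.asarray(u0_draws, dtype=float)
    u1 = np.asarray(u1_draws, dtype=float)
    return float(np.mean(u1 > u0 + eps_u))


# Made-up draws, for illustration only.
print(eta_hat([20, 25, 30, 35], [50, 40, 45, 40]))  # 0.25
```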

After each cohort, we carry out Markov chain Monte Carlo posterior simulation and
evaluate the posterior estimates η(ε_U, Y_n). Let ξ_U be an upper probability boundary
for which the trial will be terminated early and the treatment arm declared superior
if η(ε_U, Y_n) > ξ_U. Similarly, let ξ_L be a lower boundary for which the trial will be
terminated early due to futility, with the null hypothesis accepted, if η(ε_U, Y_n) < ξ_L.
The bounds ξ_U and ξ_L are chosen by preliminary simulations to obtain a design with
desirable frequentist OCs. In Section 4 we will illustrate how one may calibrate these
bounds. In summary, the sequential stopping decision at any point of the trial is: d_n = 1
if ξ_L < η(ε_U, Y_n) < ξ_U, and d_n = 0 otherwise.


Terminal decision rule. Upon stopping, we record the terminal decision a = 1 if
η(ε_U, Y_n) > 0.5 and a = 0 otherwise. Assuming that ξ_L < 0.5 < ξ_U, the rule simply
records whether we stop due to crossing the upper or the lower bound, respectively. If
the trial reaches the maximum number of patients, N = 48, the terminal decision uses
the threshold η(ε_U, Y_N) > 0.5 to determine a recommendation for Progel.
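Putting the stopping and terminal rules together, the decision made after each cohort can be sketched as follows. The default lower and upper bounds (0.05 and 0.9) are the values fixed in Section 4; `trial_decision` is a hypothetical helper, and the input η is assumed to come from a posterior estimate such as a Monte Carlo average over MCMC draws.

```python
def trial_decision(eta, n, xi_lower=0.05, xi_upper=0.9, n_max=48):
    """Return (d, a) after a cohort: d = 1 continue, d = 0 stop;
    a = 1 recommend Progel, a = 0 standard care, None while continuing."""
    if n < n_max and xi_lower < eta < xi_upper:
        return 1, None               # no boundary crossed: enroll the next cohort
    return 0, int(eta > 0.5)         # stop; the 0.5 threshold gives the terminal decision


for eta, n in [(0.95, 16), (0.40, 32), (0.03, 32), (0.60, 48)]:
    print(eta, n, trial_decision(eta, n))
```

With bounds satisfying 0.05 < 0.5 < 0.9, an interim stop at the upper bound always yields a = 1 and a stop at the lower bound always yields a = 0, which is why the single 0.5 threshold suffices for the terminal decision.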

3 Probability Model 

3.1 Model and properties 

We now construct a prior probability model for G_j, the sampling model for Y_ji for patients
under control (j = 0) and Progel (j = 1). Because some patients may be free of air leaks
immediately following surgery, we allow a point mass at Y_ji = 0 by defining G_j, j = 0, 1,
as mixtures

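$$G_j = \nu_{j0}\,\delta_0 + \sum_{h=1}^{\infty} \nu_{jh}\, N(\theta_{jh}, \sigma^2) = \nu_{j0}\,\delta_0 + (1-\nu_{j0}) \sum_{h=1}^{\infty} w_h\, N(\theta_{jh}, \sigma^2) = \nu_{j0}\,\delta_0 + (1-\nu_{j0})\, M_j, \qquad (3.1)$$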

where ν_jh = (1 − ν_j0) w_h for h ≥ 1. Also, we impose a constraint ν_10 > ν_00 on the probabilities ν_10 and
ν_00, and M_1 ⪯ M_0, formalizing the prior belief that patients are more likely to be free of an
air leak in the treatment group than in the control group. For M_j = Σ_{h=1}^∞ w_h N(θ_jh, σ²),
j = 0, 1, we use a DDP prior with common weights and dependent atoms. The common
weights w_h have the DP stick-breaking prior, w_h = V_h ∏_{ℓ<h} (1 − V_ℓ) with V_h ~ Beta(1, α).
The dependent prior on the atoms is constructed as follows, to ensure M_1 ⪯ M_0. We
assume θ_h = (θ_0h, θ_1h) ~ M*, where M* is a truncated multivariate normal base measure,
including a positive probability κ for ties θ_0h = θ_1h:

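$$M^*(\theta_h) = N(\theta_{1h} \mid \mu_1, \sigma_1^2)\,\bigl\{\kappa\,\delta_{\theta_{1h}}(\theta_{0h}) + (1-\kappa)\, N^{+}(\theta_{0h} \mid \theta_{1h}, \tau^2)\bigr\}, \qquad (3.2)$$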
where N^+(x | m, V) refers to a truncated normal random variable x subject to x > m,
and κ = p(θ_0h = θ_1h). For comparison, we will also consider inference under a variation of
model (3.2) without the order constraint, replacing the N^+ kernel by an unconstrained
normal N(θ_0h | θ_1h, τ²).

Denote M̃_j = Σ_{h=1}^∞ w_h δ_{θ_jh}, where δ_{θ_jh} denotes a point mass at θ_jh, j = 0, 1. Because
θ_0h ≥ θ_1h for every h and the two measures share the weights w_h, it is straightforward
to show that M̃_1 ⪯ M̃_0, which implies M_1 ⪯ M_0, and this in turn implies G_1 ⪯ G_0, as
desired. Barrientos et al. (2012) study the support properties of various DDP models.
Applying Theorem 2 of Barrientos et al. (2012), it follows that the proposed model has
full support over all pairs of stochastically ordered random probability measures.
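The ordering argument can be checked numerically. The sketch below draws one realization of (w_h, θ_0h, θ_1h) from a version of the prior (3.2) truncated after H atoms and compares the CDFs of M_0 and M_1 on a grid. The hyperparameter values (α = 1, κ = 0.5, τ = 1) and the fixed kernel SD of 0.5 are illustrative assumptions, not the values used in the trial, and the function names are hypothetical. The truncated normal N^+(θ_0h | θ_1h, τ²) is drawn as θ_1h plus τ times the absolute value of a standard normal, which has exactly that distribution.

```python
import math

import numpy as np


def sample_ordered_ddp(H=10, alpha=1.0, kappa=0.5, mu1=0.0, sigma1=1.0,
                       tau=1.0, rng=None):
    """Draw (w, theta0, theta1) from a truncated, order-constrained DDP prior."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.beta(1.0, alpha, size=H)
    v[-1] = 1.0  # truncation after H terms so that the weights sum to one
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))  # stick-breaking
    theta1 = rng.normal(mu1, sigma1, size=H)
    tie = rng.random(H) < kappa  # with probability kappa, theta0 = theta1
    # N+(theta0 | theta1, tau^2) truncated to theta0 > theta1: theta1 + tau * |Z|
    theta0 = np.where(tie, theta1, theta1 + tau * np.abs(rng.normal(size=H)))
    return w, theta0, theta1


# Standard normal CDF built from math.erf, applied elementwise.
std_normal_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))


def mixture_cdf(grid, w, theta, sd):
    """CDF of sum_h w_h N(theta_h, sd^2) evaluated at each grid point."""
    return np.array([np.sum(w * std_normal_cdf((x - theta) / sd)) for x in grid])


rng = np.random.default_rng(1)
w, th0, th1 = sample_ordered_ddp(rng=rng)
grid = np.linspace(-4.0, 6.0, 201)
F0 = mixture_cdf(grid, w, th0, sd=0.5)  # M_0 (control)
F1 = mixture_cdf(grid, w, th1, sd=0.5)  # M_1 (Progel)
print(bool(np.all(F1 >= F0 - 1e-12)))  # True: M_1 is stochastically smaller
```

The check holds for every draw, not just this seed: each term of F_1 dominates the matching term of F_0 because the atoms are ordered and the weights are shared.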

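For reference, we state the complete model:

$$\begin{aligned} Y_{ji} \mid \nu_{j0}, M_0, M_1 &\sim G_j = \nu_{j0}\,\delta_0 + (1-\nu_{j0}) \sum_{h=1}^{\infty} w_h\, N(Y_{ji} \mid \theta_{jh}, \sigma^2), \\ (\theta_{0h}, \theta_{1h}) \mid \mu_1, \sigma_1, \kappa, \tau &\sim M^*. \end{aligned} \qquad (3.3)$$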


We complete the model specification with choices for the hyperparameters of ν_j0, σ²,
κ, μ_1, σ_1, and τ. In the context of clinical trial design, the hyperparameters should not
introduce inappropriately strong information into the prior. To ensure this, we provide the
following guidelines.

We first standardize the data by subtracting the sample mean Ȳ_1 of the Y_1i's of the
treatment group and scaling with the sample standard deviation s_1, mapping Y_ji → (Y_ji −
Ȳ_1)/s_1. This is done to mitigate sensitivity to the measurement scale. We fix μ_1 = 0 and
σ_1 = 1 to reflect the standardization. For σ², we assume p(1/σ²) = Ga(0.001, 0.001) to
ensure that the prior is not too informative, where Ga(a, b) denotes a gamma distribution
with mean a/b. To allow for a wide range of shifts in the response density, we specify
p(1/τ²) = Ga(0.5, 0.5). This implies a Cauchy distribution for θ_0h centered at θ_1h
(half-Cauchy under the truncation in (3.2)), which often is used as a robust choice in
parametric models. To satisfy the constraint ν_10 > ν_00, we let ζ_0 = ν_00,
ζ_1 = ν_10 − ν_00, and assume p(ζ_0, ζ_1) = Dirichlet(0.1, 0.1, 0.1). Finally, we assume p(κ) =
Beta(1, 1) and p(α) = Ga(1, 1). The conjugacy of the implied normal on θ_h in (3.2) and
the normal kernel in (3.3) greatly simplifies posterior inference. Any Markov chain Monte
Carlo (MCMC) scheme for DP mixture models, as described, for example, in Neal (2000),
can be applied. Our implementation is based on the finite DP
(Ishwaran and James, 2001), which truncates the infinite sum in the DP mixture model
after a finite number of terms, H. We used H = 10, following a recommendation based on
Theorem 1 in Ishwaran and James (2002), which gives tight bounds on the approximation
error, well below what is clinically relevant in this application. Details of the MCMC
implementation are presented in Supplement A.
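To give a feel for the size of the truncation error, the sketch below evaluates the commonly cited approximation 4n exp{−(H − 1)/α} to the L1 error from truncating a DP stick-breaking prior after H terms (Ishwaran and James, 2001). We assume this form for illustration; it may differ from the bound referenced above, and it depends strongly on α, which is random under p(α) = Ga(1, 1). The helper `dp_truncation_error` is hypothetical.

```python
import math


def dp_truncation_error(n: int, H: int, alpha: float) -> float:
    """Approximate L1 error, 4 n exp(-(H - 1) / alpha), of truncating a DP
    stick-breaking prior after H terms for a sample of size n."""
    return 4.0 * n * math.exp(-(H - 1) / alpha)


print(round(dp_truncation_error(n=48, H=10, alpha=1.0), 4))  # 0.0237
```

For n = 48, H = 10, and α = 1 the approximation is about 0.024; it grows quickly as α increases, so larger posterior values of α call for more terms.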

We carried out a preliminary simulation study to better understand the nature and 
accuracy of posterior inference under the proposed model for a reasonable sample size. 
The simulation setup and results are summarized in Supplement B. The inference under 
the proposed method incorporating the stochastic ordering constraint performed well, 
indicating small bias even with moderate sample size. 


4 Trial Simulation Study 

To assess the average behavior of the proposed BNP trial design, we performed an extensive
simulation study under a variety of scenarios that were constructed to mimic the Progel
trial. For the proposed stopping and decision rules, we fixed the upper boundary ξ_U =
0.9 and the lower boundary ξ_L = 0.05, based on preliminary studies (described later) and
examining the OCs of the proposed BNP design. In all scenarios, we set the maximum number of patients to be
N = 48, randomized equally between the control and treatment groups, with cohorts of
16 patients. The smallest clinically meaningful improvement used to define the decision
criterion η(ε_U, Y_n) was determined by our clinical collaborator (RM) to be ε_U = 18, given
the numerical utilities of IAL resolution times in Table 1.

We considered nine scenarios and simulated 100 trials for each scenario. The response
outcomes Y_ji were generated from the simulation truth G°_j shown in the last column of
Table 2. Other columns in the same table show the true utilities U°_j = ∫ u(y) dG°_j(y) for
each arm and the differences U°_1 − U°_0.

To calculate type I error and power, we define the null hypothesis H_0: G_1 = G_0.
Under the proposed design, the test rejects H_0 in favor of Progel if η(ε_U, Y_n) exceeds the
upper boundary ξ_U at an interim analysis, with early stopping at n = 16 or 32, and if
η(ε_U, Y_N) > 0.5 for the terminal rule at N = 48. Similarly, the test fails to reject H_0 if
η(ε_U, Y_n) falls below the lower boundary ξ_L (n = 16, 32), with early stopping for futility,
and if η(ε_U, Y_N) < 0.5 at N = 48.
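The OCs reported in Table 3 are simple frequencies over simulated trials. A minimal sketch of that bookkeeping, assuming each simulated trial is summarized by its number of treated patients, its terminal decision, and whether it stopped early. We assume a correct decision means declaring superiority when the true utility difference exceeds the clinically meaningful difference and not declaring it otherwise. The names `SimulatedTrial` and `operating_characteristics` are hypothetical, and the four trials in the usage lines are toy input, not results from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class SimulatedTrial:
    n_treated: int  # patients treated when the trial stopped (16, 32, or 48)
    a: int          # terminal decision: 1 = Progel declared superior, 0 = not
    early: bool     # True if the trial stopped at an interim look


def operating_characteristics(trials: Sequence[SimulatedTrial],
                              superiority_is_correct: bool) -> Dict[str, float]:
    k = len(trials)

    def share(pred) -> float:
        return sum(1 for t in trials if pred(t)) / k

    ocs = {
        "MSS": sum(t.n_treated for t in trials) / k,
        "Pr(EarS)": share(lambda t: t.early and t.a == 1),
        "Pr(FinS)": share(lambda t: not t.early and t.a == 1),
        "Pr(EarF)": share(lambda t: t.early and t.a == 0),
        "Pr(FinF)": share(lambda t: not t.early and t.a == 0),
    }
    reject = ocs["Pr(EarS)"] + ocs["Pr(FinS)"]  # type I error under H_0, power otherwise
    ocs["PCD"] = reject if superiority_is_correct else 1.0 - reject
    return ocs


# Toy input only: four hypothetical trials under a null scenario.
toy = [SimulatedTrial(16, 0, True), SimulatedTrial(48, 1, False),
       SimulatedTrial(32, 0, True), SimulatedTrial(48, 0, False)]
print(operating_characteristics(toy, superiority_is_correct=False))
```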

We fixed the hyperparameters as described earlier in Section 3.1 and fit the proposed
BNP model (3.3) to each simulated data set. Table 3 summarizes the OCs of the proposed
BNP utility-based design for nine scenarios. The OCs include the average number of
patients treated, type I error, the probabilities of making the correct decision (PCD),
stopping the trial early due to either superiority, Pr(EarS), or futility, Pr(EarF), and,
in a final analysis without early stopping, declaring superiority, Pr(FinS), or futility,
Pr(FinF). For comparison, we also implemented inference under a variation of the model
without the stochastic ordering constraint, that is, model (3.2) with an unconstrained
normal N(θ_0h | θ_1h, τ²) replacing the truncated normal in (3.2). Scenarios 1a and 2a
show summaries of inference under this unconstrained version of the model, using the
same simulation truths as in Scenarios 1 and 2.

Details of the simulation results are discussed in Supplement C. Scenarios 1 and 2
are null scenarios; in Scenario 3 we assumed a large treatment effect of U°_1 − U°_0 = 41.6,
far beyond ε_U = 18; Scenario 4 has a small treatment effect of U°_1 − U°_0 = 19.2, barely
beyond ε_U; under Scenarios 5 and 6 we assumed a moderate treatment effect; and the last
three Scenarios 7, 8, and 9 have simulation truths different from the assumed mixture of
normal distributions.

For reference, we also evaluated summaries related to estimation. Denote the true
utility difference under the simulation truth by ΔU* = U°_1 − U°_0, and let ΔU = U_1 − U_0. For
each scenario, we computed the estimation bias E{ΔU − ΔU*} and root mean squared
error (RMSE) √E{(ΔU − ΔU*)²}, where the expectation is over repeated simulations
under each scenario. The results are given in Table S1.

Table 2: In each scenario, the models G°_j in the right column are the simulation truths.
Here σ = 0.3, and Exp(·) and Weib(·, ·) denote an exponential distribution and a Weibull
distribution, respectively. U°_j reports the expected utilities under the simulation truth
G°_j. The second column reports the true difference U°_1 − U°_0.

| Scenario | U°_1 − U°_0 | Group   | U°_j  | Simulation truth G°_j                                     |
|----------|-------------|---------|-------|-----------------------------------------------------------|
| 1        | 0           | Progel  | –     | Resample from historical data                             |
|          |             | Control | –     | Resample from historical data                             |
| 2        | 0           | Progel  | 23.44 | 0.1δ_0 + 0.63N(2, σ²) + 0.27N(3, σ²)                      |
|          |             | Control | 23.44 | 0.1δ_0 + 0.63N(2, σ²) + 0.27N(3, σ²)                      |
| 3        | 41.64       | Progel  | 57.25 | 0.3δ_0 + 0.49N(1, σ²) + 0.14N(2, σ²) + 0.07N(3.5, σ²)     |
|          |             | Control | 15.61 | 0.1δ_0 + 0.63N(2.5, σ²) + 0.18N(3, σ²) + 0.09N(4.5, σ²)   |
| 4        | 19.15       | Progel  | 48.94 | 0.2δ_0 + 0.56N(1.5, σ²) + 0.24N(2, σ²)                    |
|          |             | Control | 29.79 | 0.1δ_0 + 0.63N(1.8, σ²) + 0.27N(3, σ²)                    |
| 5        | 29.68       | Progel  | 64.82 | 0.4δ_0 + 0.48N(1, σ²) + 0.12N(2.5, σ²)                    |
|          |             | Control | 35.14 | 0.1δ_0 + 0.54N(1.5, σ²) + 0.18N(2.5, σ²) + 0.18N(3.5, σ²) |
| 6        | 34.15       | Progel  | 60.80 | 0.4δ_0 + 0.36N(1, σ²) + 0.12N(2, σ²) + 0.12N(3, σ²)       |
|          |             | Control | 26.65 | 0.1δ_0 + 0.36N(1.5, σ²) + 0.54N(3.5, σ²)                  |
| 7        | 43.47       | Progel  | 55.33 | 0.3δ_0 + 0.3Exp(1) + 0.4Exp(0.5)                          |
|          |             | Control | 11.86 | 0.1δ_0 + 0.4N(3, 0.2²) + 0.5N(4, 0.2²)                    |
| 8        | 8.13        | Progel  | 45.08 | 0.2δ_0 + 0.4Weib(1, 2) + 0.4Weib(0.7, 2)                  |
|          |             | Control | 36.95 | 0.2δ_0 + 0.5N(1.8, 0.2²) + 0.3N(2.5, 0.3²)                |
| 9        | 10.25       | Progel  | 55.33 | 0.3δ_0 + 0.3Exp(1) + 0.4Exp(0.5)                          |
|          |             | Control | 45.08 | 0.2δ_0 + 0.4Weib(1, 2) + 0.4Weib(0.7, 2)                  |

Table 3: Trial simulation results. MSS = mean sample size, TIE = type I error, PCD
= probability of making the correct decision, Pr(EarS) = probability of stopping early
due to superiority, Pr(EarF) = probability of stopping early due to futility, Pr(FinS)
= probability of declaring superiority in a final analysis without early stopping, and
Pr(FinF) = probability of declaring futility in a final analysis without early stopping.
All probabilities are computed by repeated simulations.

| Scenario | U°_1 − U°_0 | MSS   | TIE  | PCD  | Pr(EarS) | Pr(FinS) | Pr(EarF) | Pr(FinF) |
|----------|-------------|-------|------|------|----------|----------|----------|----------|
| 1        | 0           | 16.80 | 0.00 | 1.00 | 0.00     | 0.00     | 1.00     | 0.00     |
| 1a       | 0           | 16.32 | 0.00 | 1.00 | 0.00     | 0.00     | 1.00     | 0.00     |
| 2        | 0           | 28.80 | 0.02 | 0.98 | 0.01     | 0.01     | 0.64     | 0.34     |
| 2a       | 0           | 28.00 | 0.01 | 0.99 | 0.00     | 0.01     | 0.69     | 0.30     |
| 3        | 41.64       | 29.12 | –    | 1.00 | 0.81     | 0.18     | 0.01     | 0.00     |
| 4        | 19.15       | 40.16 | –    | 0.63 | 0.15     | 0.48     | 0.15     | 0.22     |
| 5        | 29.68       | 31.68 | –    | 0.93 | 0.60     | 0.33     | 0.04     | 0.03     |
| 6        | 34.15       | 29.92 | –    | 0.94 | 0.65     | 0.29     | 0.05     | 0.01     |
| 7        | 43.47       | 28.64 | –    | 0.96 | 0.77     | 0.19     | 0.04     | 0.00     |
| 8        | 8.13        | 34.08 | –    | 0.79 | 0.04     | 0.17     | 0.45     | 0.34     |
| 9        | 10.25       | 34.88 | –    | 0.74 | 0.10     | 0.13     | 0.34     | 0.43     |

The OCs under all nine scenarios are given in Table 3 and show a favorable evaluation
of the proposed design. The results under Scenarios 1a and 2a, compared to Scenarios 1
and 2, show that the design's frequentist OCs appear to be robust with respect to the
inclusion or not of the constraint G_1 ⪯ G_0 in the prior. The simulation truths in Scenarios
7, 8, and 9 are different from the assumed mixture of normal distributions with equal
variance. The results under these scenarios demonstrate the flexibility of BNP mixture
models with a common variance parameter. In summary, inferences under the proposed
BNP model and trial monitoring rules exhibit desirable OCs across all nine scenarios.
Parametric models and sensitivity analyses. For comparison, we implemented alternative
inference under a parametric model assuming a zero-enriched Weibull distribution,
that is, a mixture of a point mass at 0 and a Weibull distribution. We assumed
T_ij ~ G_j^W, i = 1, ..., n_j, for groups j = 0 and j = 1, using G_j^W = π_j δ_0 + (1 − π_j) Weib(λ_1j, λ_2j).
We completed the model with a prior p(π_j) = Beta(0.1, 0.1) and a conjugate prior
p(λ_2j) = InvGa(b_1j, b_2j). The hyperparameters b_1j and b_2j were determined by matching
the prior mean of λ_2j with a maximum likelihood estimate and assuming a prior variance
of 10. Finally, for λ_1j there is no conjugate prior. We followed Fink (1997) by assuming
p(λ_1j) ∝ λ_1j^{a_1j} exp{−a_2j λ_1j − …}, with a_1j = 1, a_2j = log(∏_i T_ij) + 2, and a_3j = 2,
j = 0, 1.
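The building block of this parametric comparison is the zero-enriched Weibull likelihood. The sketch below assumes a shape/scale parameterization with density (k/s)(t/s)^{k−1} exp{−(t/s)^k} for T > 0. How (λ_1j, λ_2j) map onto these parameters is not stated in the text and is an assumption, and `zero_weibull_loglik` is a hypothetical name.

```python
import math
from typing import Sequence


def zero_weibull_loglik(t: Sequence[float], pi0: float, shape: float, scale: float) -> float:
    """Log-likelihood of a zero-enriched Weibull: probability pi0 at T = 0 and
    (1 - pi0) times a Weibull(shape, scale) density for T > 0.
    ASSUMED density: (k / s) (t / s)^(k - 1) exp(-(t / s)^k)."""
    ll = 0.0
    for x in t:
        if x == 0:
            ll += math.log(pi0)                      # point mass at zero
        else:
            z = x / scale
            ll += (math.log(1.0 - pi0) + math.log(shape / scale)
                   + (shape - 1.0) * math.log(z) - z ** shape)
    return ll


print(zero_weibull_loglik([0.0, 2.0, 7.0], pi0=0.2, shape=1.5, scale=5.0))
```

With shape = 1 the continuous part reduces to an exponential distribution, which gives a quick sanity check on the density.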


Table 4 shows the OCs comparing the inferences under the proposed model versus
the zero-enriched parametric Weibull model in some (arbitrarily) selected scenarios. The
proposed BNP model with stochastic ordering compares quite favorably, with much larger
probabilities of making a correct decision and correctly stopping early for superiority.

Finally, we carried out an alternative analysis to understand how much the results
might change if different utilities u(T) were elicited. Table S2 in the supplement presents
the results of a sensitivity analysis using different utilities. In summary, while the actual
decisions naturally change, the frequentist OCs change only slightly. This is as it should be:
different utilities reflect different clinical preferences, under which different decisions are desirable.

A final set of simulations explored robustness with respect to the upper and lower decision
boundaries for the continuation decision. Table S3 summarizes OCs under Scenarios 2, 3,
and 4. Again, while some summaries, like the probability of early stopping for futility,
change in the expected direction, the nature of the overall comparison across scenarios
remains unchanged under different criteria.

Table 4: Comparisons in selected scenarios under the proposed BNP model with stochastic
ordering (BNPSO) versus an alternative parametric model with a zero-enriched
Weibull (Z-Weib).

| Scenario | True Diff | Model  | MSS   | PCD  | Pr(EarS) | Pr(FinS) | Pr(EarF) | Pr(FinF) |
|----------|-----------|--------|-------|------|----------|----------|----------|----------|
| 3        | 41.64     | BNPSO  | 29.12 | 1.00 | 0.81     | 0.18     | 0.01     | 0.00     |
|          |           | Z-Weib | 44.32 | 0.65 | 0.10     | 0.52     | 0.04     | 0.34     |
| 4        | 19.15     | BNPSO  | 40.16 | 0.63 | 0.15     | 0.48     | 0.15     | 0.22     |
|          |           | Z-Weib | 42.40 | 0.17 | 0.03     | 0.14     | 0.18     | 0.65     |
| 6        | 34.15     | BNPSO  | 29.92 | 0.94 | 0.65     | 0.29     | 0.05     | 0.01     |
|          |           | Z-Weib | 40.00 | 0.85 | 0.28     | 0.57     | 0.02     | 0.13     |
| 7        | 43.47     | BNPSO  | 28.64 | 0.96 | 0.77     | 0.19     | 0.04     | 0.00     |
|          |           | Z-Weib | 41.92 | 0.87 | 0.20     | 0.65     | 0.04     | 0.11     |
| 8        | 8.13      | BNPSO  | 34.08 | 0.79 | 0.04     | 0.17     | 0.45     | 0.34     |
|          |           | Z-Weib | 45.28 | 0.42 | 0.06     | 0.50     | 0.05     | 0.39     |


5 Conclusions and Discussion

We developed a Bayesian nonparametric (BNP) utility-based group sequential design to
compare Progel with standard care in resolving air leaks after lung surgery. In this setting,
standard statistical tests or parametric models are not appropriate for trial designs or to
describe air leak resolution time distributions. We solved the problem by developing a
BNP model with a stochastic ordering constraint and proposing a trial design based on
expected utility, computed from elicited utility values. The model assessment and trial
simulation studies show small bias and desirable OCs.

Beyond the application discussed in this paper, the proposed BNP utility-based
method can be extended to many other contexts. For example, in applications that involve
multiple groups, one may replace the truncated bivariate normal base measure M*
in (3.3) with a truncated multivariate normal distribution that incorporates the desired
stochastic ordering constraints. Furthermore, the hypothesis testing framework discussed
in Section 4 can be extended easily to testing equalities in multiple distributions that are
stochastically ordered.

Finally, we note that the BNP model could be replaced by a sufficiently flexible
parametric model without any substantial change in the performance of the proposed
design. For example, one could use a mixture of H = 5 normals as the model. However,
the computational effort for posterior simulation in any finite mixture of normals model
is no less than in the proposed DDP model. We prefer the BNP model for reasons
of conceptual clarity and, in principle, natural scaling to larger sample sizes and greater
precision.